INTRODUCTION

The purpose of research in sampling surveys is to find and develop
more efficient estimates of population characteristics with proper
sampling methods. One method of estimation is said to be more
efficient than another if the variance or mean square error of an
estimate with the first method is less than that of the second,
provided the cost of obtaining the data and results is the same for
both.

Most survey designs take the selection of n units at random, with
equal probabilities and without replacement, from a population of N
units as the basic sampling procedure. It sometimes happens that
selecting units with unequal probabilities yields a gain in
efficiency. For example, such a procedure may be appropriate when a
"measure of size" $x_i$ is known for all the units in the population
(i = 1, 2, ..., N), and these known sizes $x_i$ are correlated with
the characteristics $y_i$ for which the population total Y is to be
estimated. This criterion was first suggested by Hansen and Hurwitz
(1943), who considered a design in which a sample of one unit is
drawn with probability proportional to size within each stratum.
Horvitz and Thompson (1952) generalized the results to a sample of n
units drawn with probability proportional to size and without
replacement.


From the general formulas given by Horvitz and Thompson (1952),
Yates and Grundy (1953), Des Raj (1956) and Hartley and Rao (1962)
derived different sampling procedures in order to get more efficient
estimates. Each of these methods carries some limitations, of varying
importance. More recently, Rao, Hartley and Cochran (1962) introduced
a new method that attempts to avoid the disadvantages of the previous
methods at the expense of a slight loss in efficiency.

The  purpose  of  this  report  is  to  describe  and  discuss, 
with  the  aid  of  numerical  examples,  these  sampling  procedures. 
The  general  formulas  given  by  Horvitz  and  Thompson  (1952)  are 
introduced.   Following  this,  the  sampling  procedures  of  Yates 
and  Grundy  (1953),  Des  Raj  (1956),  Hartley  and  Rao  (1962)  and 
Rao,  Hartley  and  Cochran  (1962)  are  introduced  successively. 
The  report  concludes  with  some  numerical  examples. 


GENERAL THEORY

Horvitz and Thompson (1952) give an account of the general theory.
Suppose a population consists of N elements $y_1, y_2, \ldots, y_N$.
A sample of size n is to be drawn without replacement using
probabilities of selection proportional to measures of size. The
probability of selection associated with the i-th element of the
population prior to the first draw is denoted by $p_{i1}$
(i = 1, 2, ..., N), where

$$\sum_{i=1}^{N} p_{i1} = 1.$$

This defines a probability distribution (of selection) for the
elements of the population for samples of size one. Because sampling
is without replacement, a new probability distribution must be
defined for the remaining elements prior to each succeeding draw. For
the m-th draw designate the probabilities of selection by $p_{im}$
where, as above,

$$\sum_{i} p_{im} = 1,$$

but the summation now extends only over the N - m + 1 remaining
elements.

Knowing the probability distributions used at each draw, it is
possible to compute the a priori probability that the i-th element
(i.e., $y_i$) will be included in a sample of size n. This
probability will be designated by $\pi_i$ or $P(y_i)$. It is well
known that

$$\sum_{i=1}^{N} \pi_i = \sum_{i=1}^{N} P(y_i) = n \tag{1.1}$$

rather than one, since we are not summing probabilities of mutually
exclusive events, except for samples of size one.

There are $\binom{N}{n}$ different samples when n elements are drawn
without replacement from a finite population of N elements. It is
assumed that at each stage of the draw all remaining undrawn elements
have a probability greater than zero of being selected. When the
order of draw is taken into account, there are $n!\binom{N}{n} = S$
possible samples (since each different sample could occur in n!
different orders). Denote by $s_n$ (s = 1, 2, ..., S) the s-th such
sample of size n. The probability that $s_n$ will be drawn is given
by the product of the probabilities of selection of the elements in
the sample, considering the order of the draw. Thus, if $s_n$
contains the elements $y_i, y_j, \ldots, y_t$ drawn in that order,
then

$$P(s_n) = p_{i1}\, p_{j2} \cdots p_{tn}. \tag{1.2}$$

The probability, $\pi_i$ or $P(y_i)$, of including element $y_i$ in
the sample plays a fundamental role in the theory of developing the
estimators. For a sample of size n, $\pi_i$ reduces to a summation of
the probabilities associated with the
$n!\binom{N-1}{n-1} = S^{(i)}$ samples that contain $y_i$.
Notationally,

$$\pi_i = \sum^{S^{(i)}} P\big(s_n^{(i)}\big) \tag{1.3}$$

where a specific sample of size n which includes $y_i$ is designated
by $s_n^{(i)}$.
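The inclusion probabilities can be checked by brute-force enumeration of the ordered samples. The sketch below assumes one particular choice of draw-by-draw distributions, namely that at every draw the undrawn units are selected with probabilities proportional to their initial probabilities $p_{i1}$ (the general theory allows any $p_{im}$). The function names and the four-unit example are illustrative and are not taken from Horvitz and Thompson.

```python
from itertools import permutations


def ordered_sample_probs(p, n):
    """Map each ordered sample (tuple of 0-based unit indices) to P(s_n).

    Assumption: at each draw the remaining units are selected with
    probability proportional to their initial probabilities p.
    """
    probs = {}
    for sample in permutations(range(len(p)), n):
        prob, remaining = 1.0, 1.0
        for i in sample:
            prob *= p[i] / remaining   # conditional probability at this draw
            remaining -= p[i]          # the drawn unit leaves the population
        probs[sample] = prob
    return probs


def inclusion_probs(p, n):
    """pi_i: sum of P(s_n) over the ordered samples that contain unit i."""
    pi = [0.0] * len(p)
    for sample, prob in ordered_sample_probs(p, n).items():
        for i in sample:
            pi[i] += prob
    return pi


p = [0.1, 0.2, 0.3, 0.4]
pi = inclusion_probs(p, 2)
print([round(v, 4) for v in pi], round(sum(pi), 4))
```

With N = 4 and n = 2 there are 2! x 6 = 12 ordered samples, and the printed inclusion probabilities add up to n = 2, as equation (1.1) requires.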
The extension to the a priori probabilities of including both the
elements $y_i$ and $y_j$ in a sample of size n follows readily; that
is,

$$P_{ij} = \sum^{S^{(ij)}} P\big(s_n^{(ij)}\big) \tag{1.4}$$

since there will be $n!\binom{N-2}{n-2} = S^{(ij)}$ such samples,
where $s_n^{(ij)}$ designates a specific one.

Suppose now that one measures a characteristic Y for the n elements
in the sample. The expected value of the sum of the observed values
of Y in the sample is then

$$E\Big(\sum_{i=1}^{n} y_i\Big) = \sum_{s=1}^{S} P(s_n)\Big(\sum_{i=1}^{n} y_i\Big)_s = \sum_{i=1}^{N} \pi_i\, y_i. \tag{1.5}$$

Note that for sample sums, $y_i$ refers to the value for the element
selected on the i-th draw. It follows readily that

$$E\Big(\sum_{i=1}^{n} y_i^2\Big) = \sum_{i=1}^{N} \pi_i\, y_i^2. \tag{1.6}$$

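The expected value of the sum of cross products $y_i y_j$,
$i \ne j$, is given by:

$$E\Big(\sum_{i \ne j}^{n} y_i y_j\Big) = \sum_{s=1}^{S} P(s_n)\Big(\sum_{i \ne j}^{n} y_i y_j\Big)_s = \sum_{i \ne j}^{N} P_{ij}\, y_i y_j. \tag{1.7}$$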

Estimation  of  the  Population  Total 

Only unbiased linear estimators for the population total (Y) will be
considered. Actually, a number of linear estimators exist. Horvitz
and Thompson (1952) restricted themselves to using

$$\hat{Y} = \sum_{i=1}^{n} \beta_i\, y_i$$

where n is the size of the sample and each $\beta_i$
(i = 1, 2, ..., N) is a constant to be used as a weight for the i-th
element whenever it is selected for the sample. Also, the $\beta$
coefficients entering the sum depend on the particular sample
selected.

In order that $\hat{Y}$ be unbiased it must be true that

$$E(\hat{Y}) = \sum_{i=1}^{N} y_i$$

and, hence, from equation (1.5)

$$\sum_{i=1}^{N} \pi_i\, \beta_i\, y_i = \sum_{i=1}^{N} y_i. \tag{1.8}$$

In order for the equality of equation (1.8) to hold, it is necessary
that

$$\beta_i\, \pi_i = 1 \quad \text{for all } i.$$

Therefore,

$$\hat{Y} = \sum_{i=1}^{n} \frac{y_i}{\pi_i} \tag{1.9}$$

is the only unbiased linear estimator possible for consideration.
Note that if

$$\pi_i = \frac{n\, y_i}{Y} \quad \text{for all } i,$$

$\hat{Y}$ will have zero variance.

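Variance and the Estimate of Variance for $\hat{Y}$

By definition the variance of $\hat{Y}$ is

$$V(\hat{Y}) = E(\hat{Y} - Y)^2 = E\Big(\sum_{i=1}^{n} \frac{y_i}{\pi_i}\Big)^2 - Y^2 = E\Big[\sum_{i=1}^{n} \frac{y_i^2}{\pi_i^2} + \sum_{i \ne j}^{n} \frac{y_i y_j}{\pi_i \pi_j}\Big] - Y^2.$$

Using the results obtained in equations (1.5), (1.6), and (1.7),

$$V(\hat{Y}) = \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} + \sum_{i \ne j}^{N} \frac{P_{ij}}{\pi_i \pi_j}\, y_i y_j - Y^2. \tag{1.10}$$

This formula applies only when $\pi_i > 0$ for all i.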

An unbiased estimator of the variance of $\hat{Y}$ is also
obtainable, provided n is greater than one. Thus,

$$v(\hat{Y}) = \sum_{i=1}^{n} \frac{1 - \pi_i}{\pi_i^2}\, y_i^2 + \sum_{i \ne j}^{n} \frac{P_{ij} - \pi_i \pi_j}{\pi_i \pi_j P_{ij}}\, y_i y_j \tag{1.11}$$

provided both $\pi_i$ and $P_{ij}$ are greater than zero for all i
and j.

In the estimating functions (1.9), (1.10) and (1.11) it will be
noticed that it is the quantities $\pi_i$ and $P_{ij}$ that can be
controlled by the sampler. Assume a "measure of size" $x_i$ is known
for all the units in the population (i = 1, 2, ..., N) and it is
suspected that these known sizes $x_i$ are correlated with the
characteristics $y_i$ for which the population total Y is to be
estimated. The sampler may wish to utilize the information in $x_i$
in assigning the selection probabilities such that the resulting
$\pi_i$ and $P_{ij}$ will lead to a reduction in variance. Many
papers discuss this kind of problem. In the next three sections three
sampling procedures will be discussed.
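The estimator (1.9), its variance (1.10) and the variance estimator (1.11) can be verified numerically on a small population by enumerating every sample. The sketch below is only an illustration under stated assumptions: the design is the successive-draw scheme in which each draw is proportional to the initial probabilities of the undrawn units, the values of p and y are an arbitrary four-unit example, and all function names are hypothetical.

```python
from itertools import combinations, permutations


def design(p, n):
    """Probabilities of unordered samples under successive draws without
    replacement, each draw proportional to the p of the undrawn units."""
    d = {}
    for s in permutations(range(len(p)), n):
        prob, remaining = 1.0, 1.0
        for i in s:
            prob *= p[i] / remaining
            remaining -= p[i]
        key = tuple(sorted(s))
        d[key] = d.get(key, 0.0) + prob
    return d


def inclusion(d, N):
    """First-order (pi_i) and joint (P_ij) inclusion probabilities."""
    pi = [0.0] * N
    P = [[0.0] * N for _ in range(N)]
    for s, prob in d.items():
        for i in s:
            pi[i] += prob
        for i, j in combinations(s, 2):
            P[i][j] += prob
            P[j][i] += prob
    return pi, P


def ht_estimate(s, y, pi):                      # equation (1.9)
    return sum(y[i] / pi[i] for i in s)


def ht_variance(y, pi, P):                      # equation (1.10)
    N = len(y)
    v = sum(y[i] ** 2 / pi[i] for i in range(N))
    v += sum(P[i][j] * y[i] * y[j] / (pi[i] * pi[j])
             for i in range(N) for j in range(N) if i != j)
    return v - sum(y) ** 2


def ht_variance_estimate(s, y, pi, P):          # equation (1.11)
    v = sum((1 - pi[i]) * y[i] ** 2 / pi[i] ** 2 for i in s)
    v += sum((P[i][j] - pi[i] * pi[j]) * y[i] * y[j]
             / (pi[i] * pi[j] * P[i][j])
             for i in s for j in s if i != j)
    return v


p = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 1.2, 2.1, 3.2]
d = design(p, 2)
pi, P = inclusion(d, len(p))

mean_est = sum(q * ht_estimate(s, y, pi) for s, q in d.items())
var_direct = sum(q * (ht_estimate(s, y, pi) - sum(y)) ** 2 for s, q in d.items())
var_formula = ht_variance(y, pi, P)
mean_var_est = sum(q * ht_variance_estimate(s, y, pi, P) for s, q in d.items())
print(mean_est, var_direct, var_formula, mean_var_est)
```

The expectation of the estimator equals the population total, the directly computed variance agrees with formula (1.10), and the average of (1.11) over all samples reproduces that variance, which is what unbiasedness means here.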


METHODS  OF  YATES  AND  GRUNDY 
Description  of  the  Sampling  Procedure 

Yates and Grundy (1953) attack the problem of assigning selection
probabilities as follows. The first unit in the sample is selected
with probabilities proportional to revised sizes, which are obtained
by an iteration method explained later in this section; the second
unit is selected with probabilities proportional to the remaining
revised sizes, and so on. The revised size measures are determined
from the original size measures.

To obtain probabilities using the original size measures, let $y_i$
denote a characteristic attached to the i-th unit of a finite
population of N units. Suppose $x_i$ is a known size measure related
to the i-th unit. For convenience in writing the formulas we may
replace the actual measures of size $x_i$ by proportions $p_i$ such
that $p_i = x_i / \sum_{i=1}^{N} x_i$ and $\sum_{i=1}^{N} p_i = 1$.
The probability of selecting units i and j in that order is then

$$\frac{p_i\, p_j}{1 - p_i}.$$

The total probability of selecting units i and j when a sample of two
units is taken is therefore

$$P_{ij} = p_i\, p_j \Big(\frac{1}{1 - p_i} + \frac{1}{1 - p_j}\Big). \tag{2.1}$$




The total probability of selecting unit i, which we may denote by
$2p_i'$, is given by the sum of the probability of selecting unit i
first and the probability of selecting unit i second after having
selected some other unit. If

$$S = \sum_{i=1}^{N} \frac{p_i}{1 - p_i}$$

we have

$$2p_i' = p_i \Big(1 + S - \frac{p_i}{1 - p_i}\Big) \tag{2.2}$$

where $p_i'$ may be termed the effective relative probability of
selection of unit i. These probabilities are easily calculated. All
that is necessary is to calculate all $p_i/(1 - p_i)$ and their sum.

Determine  Revised  Size  Measures 

Let the revised probabilities of selection, whose effective relative
probabilities are to equal the original $p_i$, be denoted by $q_i$.
From equation (2.2), substituting $p_i$ for $p_i'$, and $q_i$ for
$p_i$, we have the following N equations for determining the $q_i$:

$$2p_i = q_i \Big(1 + S' - \frac{q_i}{1 - q_i}\Big), \quad i = 1, 2, \ldots, N \tag{2.3}$$

where

$$S' = \sum_{i=1}^{N} \frac{q_i}{1 - q_i}.$$

As a first approximation we may put

$$1 + S' - \frac{q_i}{1 - q_i} = 1 + S - \frac{p_i}{1 - p_i}. \tag{2.4}$$

From equation (2.2) the right hand side of equation (2.4) equals
$2p_i'/p_i$. Hence, a first approximation is given by modifying
equation (2.3) such that

$$(q_i)_1 = \frac{p_i^2}{p_i'}. \tag{2.5}$$


It will be necessary to make small adjustments in the $(q_i)_1$ in
order to make them add up to unity. The new effective relative
probabilities of selection $(q_i')_1$ can now be calculated from
these $(q_i)_1$ in the same manner as the $p_i'$ were calculated from
$p_i$ by equation (2.2).

A second approximation is given by

$$\frac{p_i}{(q_i)_2} = \frac{(q_i')_1}{(q_i)_1}.$$

Therefore

$$(q_i)_2 = \frac{p_i\,(q_i)_1}{(q_i')_1}$$

with adjustment as before.

Then the n-th approximation will be

$$(q_i)_n = \frac{p_i\,(q_i)_{n-1}}{(q_i')_{n-1}}. \tag{2.6}$$
Numerical Example

In order to illustrate the practical utility of the above formulas we
take a population with N = 4 as shown in Table 1. Assume the original
size measures are known.

Table 1.  Successive Approximations to Required Probabilities

Unit     p_i     (q_i)_1    (q_i')_1    (q_i)_2    (q_i')_2

1        .1      .084       .101        .081       .100
2        .2      .180       .207        .173       .203
3        .3      .293       .308        .283       .305
4        .4      .443       .382        .463       .392

Total    1.0     1.000      .998        1.000      1.000
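The iteration of equations (2.2), (2.5) and (2.6) is short enough to sketch in code. The block below is an illustration only: it assumes that the "small adjustments" are a proportional rescaling so that each approximation adds up to unity (the text does not say how the adjustment was made), and the function names are hypothetical.

```python
def effective_probs(p):
    """Equation (2.2): p'_i = p_i * (1 + S - p_i / (1 - p_i)) / 2, for n = 2."""
    S = sum(v / (1 - v) for v in p)
    return [v * (1 + S - v / (1 - v)) / 2 for v in p]


def normalise(q):
    """Assumed adjustment: rescale so that the values add up to unity."""
    total = sum(q)
    return [v / total for v in q]


def revised_probs(p, iterations):
    """Successive approximations (q_i)_1, (q_i)_2, ... from (2.5) and (2.6)."""
    q = normalise([v * v / e for v, e in zip(p, effective_probs(p))])  # (2.5)
    history = [q]
    for _ in range(iterations - 1):
        q_eff = effective_probs(q)
        q = normalise([pv * qv / ev for pv, qv, ev in zip(p, q, q_eff)])  # (2.6)
        history.append(q)
    return history


p = [0.1, 0.2, 0.3, 0.4]
for step, q in enumerate(revised_probs(p, 2), start=1):
    print(step, [round(v, 3) for v in q],
          [round(v, 3) for v in effective_probs(q)])
```

Under this rescaling assumption the first approximation rounds to the $(q_i)_1$ column of Table 1, and the later columns agree with the table to within about 0.002, a gap that is plausibly due to hand rounding in the original computation.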

Variance  and  the  Estimate  of  Variance 
of  the  Estimator  of  Population  Total 


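Only the case n = 2 is considered. In terms of the notation of the
previous section, $\pi_i = 2p_i'$, with $P_{ij}$ given by equation
(2.1). Then, from equation (1.9),

$$\hat{Y} = \frac{y_i}{\pi_i} + \frac{y_j}{\pi_j}.$$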
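A more suitable form of the variance may be derived from equation
(1.10). We have

$$\sum_{j \ne i}^{N} P_{ij} = (n - 1)\,\pi_i.$$

It is also easily established that

$$\sum_{j \ne i}^{N} (P_{ij} - \pi_i \pi_j) = (n - 1)\pi_i - \pi_i (n - \pi_i) = -\pi_i (1 - \pi_i).$$

Hence

$$\begin{aligned}
V(\hat{Y}) &= \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} + \sum_{i \ne j}^{N} \frac{P_{ij}}{\pi_i \pi_j}\, y_i y_j - Y^2 \\
&= \sum_{i=1}^{N} \pi_i \frac{y_i^2}{\pi_i^2} + \sum_{i \ne j}^{N} P_{ij} \frac{y_i y_j}{\pi_i \pi_j} - \sum_{i=1}^{N} \pi_i^2 \frac{y_i^2}{\pi_i^2} - \sum_{i \ne j}^{N} \pi_i \pi_j \frac{y_i y_j}{\pi_i \pi_j} \\
&= \sum_{i<j}^{N} (\pi_i \pi_j - P_{ij}) \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2.
\end{aligned}$$

An unbiased estimate of variance is given by

$$v(\hat{Y}) = \sum_{i<j}^{n} \frac{\pi_i \pi_j - P_{ij}}{P_{ij}} \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2.$$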
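The passage from equation (1.10) to the pairwise form of the variance can be spelled out in a few lines. The shorthand $z_i = y_i/\pi_i$ is introduced here for compactness and does not appear in the original; the second line uses the identity $\sum_{j \ne i} (\pi_i \pi_j - P_{ij}) = \pi_i (1 - \pi_i)$ established above.

```latex
\begin{aligned}
V(\hat{Y}) &= \sum_{i=1}^{N} \pi_i (1 - \pi_i)\, z_i^2
            + \sum_{i \ne j}^{N} (P_{ij} - \pi_i \pi_j)\, z_i z_j \\
           &= \sum_{i=1}^{N} \sum_{j \ne i}^{N} (\pi_i \pi_j - P_{ij})\, z_i^2
            - \sum_{i \ne j}^{N} (\pi_i \pi_j - P_{ij})\, z_i z_j \\
           &= \sum_{i<j}^{N} (\pi_i \pi_j - P_{ij})\,(z_i^2 + z_j^2 - 2 z_i z_j) \\
           &= \sum_{i<j}^{N} (\pi_i \pi_j - P_{ij})\,(z_i - z_j)^2 .
\end{aligned}
```

Each unordered pair (i, j) contributes $z_i^2$ once from the i-th row and $z_j^2$ once from the j-th row of the double sum, which is why the squares combine with the cross products into a perfect square.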




METHOD  OF  DES  RAJ 
Description  of  the  Sampling  Procedure 

The following method is described by Des Raj (1956) and is relatively
simple. Out of the totality of $\binom{N}{2}$ groups with two units
each (this method is likely to be inconvenient for larger sample
sizes), one selects one group. The restriction is that the group
probabilities $P_{ij}$ should be assigned in an optimum way; i.e.,

$$P_{ij} \ge 0, \tag{3.1}$$

$$\sum_{j \ne i}^{N} P_{ij} = \pi_i \quad (i = 1, 2, \ldots, N), \tag{3.2}$$

and

$$V(\hat{Y}) = \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} + \sum_{i \ne j}^{N} \frac{P_{ij}}{\pi_i \pi_j}\, y_i y_j - Y^2$$

is minimized.

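It is assumed that

$$y_i = \alpha + \beta x_i \tag{3.3}$$

where x is a "measure of size" and y represents the characteristic of
the population; $\alpha$ and $\beta$ are constants. Since $\pi_i$ is
proportional to $x_i$, by modification of the constant $\beta$

$$y_i = \alpha + \beta\, \pi_i. \tag{3.4}$$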
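Since

$$\sum_{j \ne i}^{N} P_{ij} = \pi_i,$$

then

$$\sum_{i \ne j}^{N} \frac{P_{ij}}{\pi_i} = \sum_{i=1}^{N} \frac{1}{\pi_i} \sum_{j \ne i}^{N} P_{ij} = N.$$

Furthermore,

$$\sum_{i=1}^{N} \pi_i = 2, \qquad \sum_{i \ne j}^{N} P_{ij} = 2.$$

So

$$\begin{aligned}
V(\hat{Y}) &= \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} + \sum_{i \ne j}^{N} P_{ij}\, \frac{(\alpha + \beta \pi_i)(\alpha + \beta \pi_j)}{\pi_i \pi_j} - Y^2 \\
&= \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} + 2\alpha^2 \sum_{i<j}^{N} \frac{P_{ij}}{\pi_i \pi_j} + 2\alpha\beta N + 2\beta^2 - Y^2. \tag{3.5}
\end{aligned}$$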
In equation (3.5), only $\sum_{i<j} P_{ij}/(\pi_i \pi_j)$ is
variable. The problem reduces to the determination of $P_{ij}$ such
that

$$\sum_{j \ne i}^{N} P_{ij} = \pi_i \quad (i = 1, 2, \ldots, N), \qquad P_{ij} \ge 0,$$

and

$$\sum_{i<j}^{N} \frac{P_{ij}}{\pi_i \pi_j} \ \text{is a minimum.} \tag{3.6}$$


Numerical Example

As an illustration of the practical utility of the method the three
populations given by Yates and Grundy (1953) are taken into
consideration. The object is to estimate the population total by
selecting two units with probabilities of inclusion $\pi_i$
proportional to the following $p_i$.

Unit     p_i     y_i

1        0.1     0.5
2        0.2     1.2
3        0.3     2.1
4        0.4     3.2

We have to find $P_{ij}$ according to the following restrictions:

$$P_{ij} \ge 0,$$

$$\begin{aligned}
P_{12} + P_{13} + P_{14} &= 0.2, \\
P_{21} + P_{23} + P_{24} &= 0.4, \\
P_{31} + P_{32} + P_{34} &= 0.6, \\
P_{41} + P_{42} + P_{43} &= 0.8,
\end{aligned}$$

where $P_{ij} = P_{ji}$, and

$$G = 12.5 P_{12} + 8.3333 P_{13} + 6.25 P_{14} + 4.1667 P_{23} + 3.125 P_{24} + 2.0833 P_{34}$$

is minimized.

The optimum assignment of $P_{ij}$, obtained by the simplex method,
is given in Table 2 below.


Table 2.  Optimum Assignment of P_ij

i \ j     1      2      3      4      Total

1         -      0.0    0.0    0.2    0.2
2         0.0    -      0.2    0.2    0.4
3         0.0    0.2    -      0.4    0.6
4         0.2    0.2    0.4    -      0.8

Total     0.2    0.4    0.6    0.8    2.0
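The optimum in Table 2 can be checked without a linear-programming package. The sketch below rests on a parametrisation derived here rather than given in the text: the four row-sum restrictions leave only $P_{12}$ and $P_{13}$ free, so a coarse grid search over those two values stands in for the simplex method. It then evaluates the pairwise form of the variance, $\sum_{i<j} (\pi_i \pi_j - P_{ij})(y_i/\pi_i - y_j/\pi_j)^2$, for the three Yates and Grundy populations (the y values of populations B and C are those listed in Table 5). All names are illustrative.

```python
from itertools import combinations

pi = [0.2, 0.4, 0.6, 0.8]            # inclusion probabilities, pi_i = 2 p_i


def joint_probs(a, b):
    """P_ij meeting the four row-sum restrictions, with a = P_12, b = P_13.

    Keys are 0-based pairs (i, j) with i < j.
    """
    return {(0, 1): a, (0, 2): b, (0, 3): 0.2 - a - b,
            (1, 2): 0.2 - a - b, (1, 3): 0.2 + b, (2, 3): 0.4 + a}


def objective(P):
    """G = sum over i < j of P_ij / (pi_i * pi_j)."""
    return sum(v / (pi[i] * pi[j]) for (i, j), v in P.items())


# Feasible region: a >= 0, b >= 0, a + b <= 0.2 (all P_ij non-negative).
grid = [(k / 100, m / 100) for k in range(21) for m in range(21 - k)]
best = min(grid, key=lambda ab: objective(joint_probs(*ab)))
P_opt = joint_probs(*best)


def variance(y, P):
    """Pairwise form of the variance of the Horvitz-Thompson estimator."""
    z = [yi / p for yi, p in zip(y, pi)]
    return sum((pi[i] * pi[j] - P[(i, j)]) * (z[i] - z[j]) ** 2
               for i, j in combinations(range(len(y)), 2))


populations = {"A": [0.5, 1.2, 2.1, 3.2],
               "B": [0.8, 1.4, 1.8, 2.0],
               "C": [0.2, 0.6, 0.9, 0.8]}
variances = {name: variance(y, P_opt) for name, y in populations.items()}
print(best, {k: round(v, 3) for k, v in variances.items()})
```

The grid minimum falls at $P_{12} = P_{13} = 0$, which is the assignment of Table 2, and the resulting variances are 0.200, 0.200 and 0.100 for populations A, B and C.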

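The estimate of the population total and the estimate of its variance
are obtained by using the standard formulas given by Horvitz and
Thompson (1952) (see equations (1.9) and (1.11)). Thus

$$\hat{Y} = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

and

$$v(\hat{Y}) = \sum_{i=1}^{n} \frac{1 - \pi_i}{\pi_i^2}\, y_i^2 + \sum_{i \ne j}^{n} \frac{P_{ij} - \pi_i \pi_j}{\pi_i \pi_j P_{ij}}\, y_i y_j$$

is an unbiased estimator of the variance of $\hat{Y}$.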
METHOD OF H. O. HARTLEY AND J. N. K. RAO

Description of the Sampling Procedure

In the method proposed by Hartley and Rao (1962), it is assumed that

$$\pi_i = n p_i \le 1 \tag{4.1}$$

where

$$p_i = x_i \Big/ \sum_{j=1}^{N} x_j.$$

Arrange the units of the population in a random order. Then assign
the labels j = 1, 2, ..., N to the units in this random order and
denote the progressive totals of the $n p_j$ in that order by

$$\Pi_j = \sum_{i=1}^{j} n p_i, \qquad \Pi_0 = 0.$$

Select a "random start," i.e., select a "uniform variate" d with
$0 \le d < 1$. Then the n selected units are those whose index, j,
satisfies

$$\Pi_{j-1} \le d + k < \Pi_j \tag{4.2}$$

for some integer k between 0 and n - 1. Since $n p_j \le 1$, every
one of the n integers k = 0, 1, 2, ..., n - 1 will select a different
sampling unit j.

Numerical  Example 

Consider a population of N = 8 units arranged in a random order and
with sizes $x_j$ shown in the second column of Table 3. A sample of
n = 3 is to be drawn using this sampling procedure. Instead of
computing the quantities $n p_j$, scale all computations up by a
factor of $\sum_{j=1}^{8} x_j / n = 300/3 = 100$. Then compute the
progressive sums of the $x_j$; these are shown in column 3 of Table 3
and correspond to the quantities $100\,\Pi_j$. Then select a random
integer between 1 and 100; this corresponds to the quantity $100d$.
In this example the integer turned out to be 58, and the selection of
the three units in accordance with (4.2) is shown in column 4. We
find the lines (j) where the column $100\,\Pi_j$ passes through the
levels 100d = 58 (for k = 0), 100d + 100 = 158 (for k = 1) and
100d + 200 = 258 (for k = 2). So the units j = 2, 4, 8 are selected.


Table 3.  Selection of n = 3 Units from Population of N = 8 Units (p.p.s.)

Unit Number    Size    Progressive Sum    Selection
     j         x_j       100 \Pi_j

     1          15           15
     2          81           96           k=0, 100d = 58
     3          26          122
     4          42          164           k=1, 100d + 100 = 158
     5          20          184
     6          16          200
     7          45          245
     8          55          300           k=2, 100d + 200 = 258
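The selection in Table 3 amounts to a lookup of n equally spaced levels in the column of progressive sums. The sketch below works on the scaled integers of the example and makes two assumptions that are not in the text: the sizes add up to a multiple of n, and an integer start r between 1 and 100 selects the unit whose progressive sum is the first one not below r (the continuous rule (4.2) uses a half-open interval on the other side, which gives the same units here). The function name is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(sizes, n, start):
    """Select n units with a random start and a fixed step.

    sizes : size measures x_j, already arranged in random order
    n     : sample size (sum(sizes) is assumed to be divisible by n)
    start : random integer between 1 and sum(sizes) // n
    Returns the 1-based labels j of the selected units.
    """
    step = sum(sizes) // n                      # scale factor, 100 in Table 3
    if max(sizes) > step:
        raise ValueError("every unit needs n * p_j <= 1")
    cumulative = list(accumulate(sizes))        # progressive sums
    # The level start + k*step picks the unit j with cum[j-1] < level <= cum[j].
    return [bisect_left(cumulative, start + k * step) + 1 for k in range(n)]


sizes = [15, 81, 26, 42, 20, 16, 45, 55]
selected = systematic_pps(sizes, n=3, start=58)
print(selected)
```

With the random start 58 of the example this returns the units 2, 4 and 8. Because no size exceeds the step of 100, any start between 1 and 100 yields three different units.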
With the help of asymptotic theory, Hartley and Rao (1962) derive
compact expressions for the variance of the estimate of the
population total, together with estimates of that variance. These
formulas are applicable for moderate values of N.

For the case n = 2, the variance of the estimate of the population
total $\hat{Y} = \sum_{i=1}^{n} y_i/\pi_i$ is given by


4[i/^^^ 


(1|.3) 


to  terms  of  O(N^) ,  and 
H 


to  terms  of  O(N^) . 


TT; 


Y 


) 


ih-h) 


The estimate of $V_1(\hat{Y})$ is

U.(Y;  =  ri-(TT,tTr.3.±z<-±«tTr/)-i(l</ 


I  ^  ^M/Ji.-li-^* 


while the estimate of $V_2(\hat{Y})$ is




The choice between $V_1(\hat{Y})$ and $V_2(\hat{Y})$ depends on the
size of the population. For moderately large N, $V_2(\hat{Y})$ will
contribute enough to the reduction in the variance. However, for
smaller N, it may be necessary to take $V_1(\hat{Y})$ into account.

For the general case n > 2, the variance becomes


to  terms  of  O(N^)  and  this  is  estimated  by 


(i|.7) 


(U.8) 
A SIMPLE PROCEDURE GIVEN BY
RAO, HARTLEY AND COCHRAN

Description of the Sampling Procedure

The following procedure was discussed by Rao, Hartley and Cochran
(1962). Let $p_t$ be the probability of drawing the t-th unit in the
first draw from the whole population. For example, suppose we are
sampling with probability proportional to size $x_t$, where
$p_t = x_t / \sum_{t=1}^{N} x_t$. The sampling procedure consists of
the following two stages:

(1) Split the population at random into n groups of sizes
    $N_1, N_2, N_3, \ldots, N_n$, where
    $N_1 + N_2 + \cdots + N_n = N$.

(2) Draw a sample of size one with probabilities proportional to
    $p_t$ from each of these n groups independently.

If the t-th unit falls in group i, the actual probability that it
will be selected is $p_t/\pi_i$, where

$$\pi_i = \sum_{\text{group } i} p_t. \tag{5.1}$$

Numerical  Example 

Take the same population of N = 8 units shown in Table 3. Arrange
these units in a random order. Break them into three groups (i.e.,
n = 3) with $N_1$ = 3 units, $N_2$ = 3 units and $N_3$ = 2 units
according to the first stage of this sampling procedure. Then perform
the procedure in stage two to find $\pi_i$ and $p_t/\pi_i$.


Table 4.  Relative Probabilities from Population of N = 8

Unit Number    Size    Relative Probability
     t         x_t            p_t

     1          15           0.050
     2          81           0.270
     3          26           0.087
     4          42           0.140
     5          20           0.067
     6          16           0.053
     7          45           0.150
     8          55           0.183

   Total       300           1.000

Let the first group ($N_1$ = 3) consist of units 1, 4 and 8; then

$$\pi_1 = \sum_{\text{group } 1} p_t = 0.050 + 0.140 + 0.183 = 0.373$$

and

$$\frac{p_8}{\pi_1} = \frac{0.183}{0.373} = 0.491$$

if unit 8 is drawn in the sample of n = 3.

Let the second group ($N_2$ = 3) consist of units 2, 5 and 6; then

$$\pi_2 = 0.270 + 0.067 + 0.053 = 0.390$$

and

$$\frac{p_2}{\pi_2} = \frac{0.270}{0.390} = 0.692$$

if unit 2 is drawn in the sample of n = 3.

Let the third group ($N_3$ = 2) consist of units 3 and 7; by the same
method

$$\pi_3 = 0.087 + 0.150 = 0.237, \qquad \frac{p_7}{\pi_3} = \frac{0.150}{0.237} = 0.633,$$

assuming unit 7 is drawn from this group.
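The two stages of the procedure, and the group totals of this example, can be sketched as follows. This is an illustrative implementation, not code from Rao, Hartley and Cochran: the function names are hypothetical, and Python's standard `random` module is assumed to be an acceptable source of randomness.

```python
import random


def rhc_sample(sizes, group_sizes, rng):
    """Draw one sample: random split into groups, then one unit per group
    with probability proportional to size within the group.

    Returns a list of (unit index, p_t, pi_i) with 0-based unit indices.
    """
    total = sum(sizes)
    units = list(range(len(sizes)))
    rng.shuffle(units)                               # stage 1: random grouping
    sample, start = [], 0
    for g in group_sizes:
        group = units[start:start + g]
        start += g
        weights = [sizes[t] for t in group]
        t = rng.choices(group, weights=weights, k=1)[0]   # stage 2: one PPS draw
        sample.append((t, sizes[t] / total, sum(weights) / total))
    return sample


def rhc_estimate(sample, y):
    """Estimator of the total: sum of y_t * pi_i / p_t over the selected units."""
    return sum(y[t] * pi_i / p_t for t, p_t, pi_i in sample)


def group_totals(sizes, groups):
    """pi_i of equation (5.1) for a fixed grouping of 1-based unit labels."""
    total = sum(sizes)
    return [sum(sizes[t - 1] for t in g) / total for g in groups]


sizes = [15, 81, 26, 42, 20, 16, 45, 55]
pi = group_totals(sizes, [[1, 4, 8], [2, 5, 6], [3, 7]])
print([round(v, 3) for v in pi])

sample = rhc_sample(sizes, [3, 3, 2], random.Random(0))
print(sample)
```

For the grouping of the example the group totals round to 0.373, 0.390 and 0.237. As a sanity check on the estimator, when y is exactly proportional to the size measure every sample returns the population total, since each ratio $y_t/p_t$ then equals Y and the $\pi_i$ add up to one.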


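Variance of the Estimate of Y

The estimator of the population total Y is

$$\hat{Y} = \sum_{i=1}^{n} \frac{y_i}{p_i}\, \pi_i \tag{5.2}$$

where the suffixes 1, 2, ..., n denote the n units selected from the
n groups separately.

The variance of $\hat{Y}$ is

$$V(\hat{Y}) = \frac{\sum_{i=1}^{n} N_i^2 - N}{N(N-1)} \Big(\sum_{t=1}^{N} \frac{y_t^2}{p_t} - Y^2\Big). \tag{5.3}$$

Now the estimator of Y in sampling with replacement is

$$\hat{Y}' = \frac{1}{n} \sum^{n} \frac{y_t}{p_t} \tag{5.4}$$

where $\sum^{n}$ denotes the summation over the n units drawn with
replacement, with variance

$$V(\hat{Y}') = \frac{1}{n} \Big(\sum_{t=1}^{N} \frac{y_t^2}{p_t} - Y^2\Big) \tag{5.5}$$

(see Cochran (1963)).

Therefore

$$V(\hat{Y}) = \frac{n \big(\sum_{i=1}^{n} N_i^2 - N\big)}{N(N-1)}\, V(\hat{Y}'). \tag{5.6}$$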

From equation (5.6) it is seen that $V(\hat{Y})$ will be minimized if
we choose $N_1 = N_2 = \cdots = N_n$. Therefore, if N/n = R, where R
is a positive integer, then $N_1 = N_2 = \cdots = N_n = R$, and
equation (5.6) becomes

$$V(\hat{Y}) = \frac{n (nR^2 - N)}{N(N-1)}\, V(\hat{Y}') = \frac{N - n}{N - 1}\, V(\hat{Y}') = \Big(1 - \frac{n-1}{N-1}\Big) V(\hat{Y}'). \tag{5.7}$$

Equation (5.7) clearly shows the reduction in the variance as
compared to sampling with replacement. If N is not a multiple of n,
we have N = nR + k where 0 < k < n and R is a positive integer. Then
choose

$$N_1 = N_2 = \cdots = N_k = R + 1; \quad N_{k+1} = N_{k+2} = \cdots = N_n = R$$

and equation (5.6) reduces to

$$V(\hat{Y}) = \Big[1 - \frac{n-1}{N-1} + \frac{k(n-k)}{N(N-1)}\Big] V(\hat{Y}'). \tag{5.8}$$

For k = 1 or n - 1, equation (5.8) reduces to

$$V(\hat{Y}) = \Big(1 - \frac{n-1}{N}\Big) V(\hat{Y}'). \tag{5.9}$$


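The unbiased estimator of $V(\hat{Y})$ is

$$v(\hat{Y}) = \frac{N^2 + k(n-k) - Nn}{N^2 (n-1) - k(n-k)} \sum_{i=1}^{n} \pi_i \Big(\frac{y_i}{p_i} - \hat{Y}\Big)^2 \tag{5.10}$$

with $N_1 = N_2 = \cdots = N_k = R + 1$;
$N_{k+1} = N_{k+2} = \cdots = N_n = R$.

Analogous to equation (5.9), for k = 1 or n - 1 one obtains

$$v(\hat{Y}) = \frac{N - n + 1}{(n-1)(N+1)} \sum_{i=1}^{n} \pi_i \Big(\frac{y_i}{p_i} - \hat{Y}\Big)^2. \tag{5.11}$$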
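The reductions of equation (5.6) for these group sizes all rest on one identity, worked out here as a check. With k groups of size R + 1 and n - k groups of size R, so that N = nR + k:

```latex
\begin{aligned}
n \sum_{i=1}^{n} N_i^2
  &= n\left[k(R+1)^2 + (n-k)R^2\right]
   = (nR + k)^2 + k(n-k)
   = N^2 + k(n-k), \\[4pt]
\frac{n\left(\sum_{i=1}^{n} N_i^2 - N\right)}{N(N-1)}
  &= \frac{N^2 - nN + k(n-k)}{N(N-1)}
   = 1 - \frac{n-1}{N-1} + \frac{k(n-k)}{N(N-1)} .
\end{aligned}
```

Setting k = 0 gives the factor $1 - (n-1)/(N-1)$ of equation (5.7). Setting k(n - k) = n - 1, that is k = 1 or k = n - 1, gives $(N - n + 1)/N = 1 - (n-1)/N$, the factor of equation (5.9).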
EXAMPLES AND DISCUSSION
Numerical Example

In order to compare the efficiency of these different procedures,
consider the three populations introduced by Yates and Grundy (1953).
The three populations have the same set of $p_i$ values and are given
in Table 5.

Variances for the five procedures and the three populations are given
in Table 6. Variances for procedures 1 and 2 are taken from Des Raj
(1956). For procedure 3 the variance is taken from Hartley and Rao
(1962). Variances for procedures 4 and 5 are obtained from equations
(5.7) and (5.5), respectively.

Table 5.  Three Populations of Size N = 4

Number     p_i     Population A    Population B    Population C
                       y_i             y_i             y_i

1          0.1         0.5             0.8             0.2
2          0.2         1.2             1.4             0.6
3          0.3         2.1             1.8             0.9
4          0.4         3.2             2.0             0.8
Table 6.  Comparative Efficiency of Five Sampling Procedures

                               Population A     Population B     Population C
Procedure                      Var.    Eff.%    Var.    Eff.%    Var.    Eff.%

1. Des Raj                     0.200   100.0    0.200   100.0    0.100   100.0
2. Yates & Grundy              0.323    61.9    0.269    74.3    0.057   175.4
3. Hartley & Rao               0.367    54.5    0.367    54.5    0.033   333.3
4. Rao, Hartley & Cochran      0.333    60.1    0.333    60.1    0.083   120.5
5. With Replacement            0.500    40.0    0.500    40.0    0.125    80.0

Discussion 

It is seen from Table 6 that procedures 1, 2, 3 and 4 are more
efficient than sampling with replacement.

Des Raj's procedure is the most efficient on populations A and B
because A and B satisfy the linear model $y = \alpha + \beta x$
fairly well. For population C the model is not appropriate, so that a
considerable loss in efficiency results from Des Raj's procedure.
This procedure is not convenient for large sample sizes; it is
usually applied to the case n = 2. Though the procedure involves
heavy computation through the simplex method of linear programming, a
computer may be used to solve it.

So far as efficiency is concerned, procedures 2, 3 and 4 are almost
the same. Procedure 4 may have a slight loss in efficiency when N is
moderate and is not a multiple of n.

Procedure 2 requires a cumbersome revision of the size measures by an
iteration method. Besides, it is impractical for use in the case
n > 2.

Procedure 3 gives only an asymptotic variance for the estimates of Y.
It is impractical when N is small. For large and moderate size
populations it provides a convenient process for sampling analysis.

Procedure 4 does not need heavy computations. It is convenient both
when the sample size is n = 2 and when it is n > 2. In comparison to
procedure 3 this procedure will, in many situations, lead to an
estimator with a slightly larger variance.

Conclusion 


When a population approximately satisfies the linear model
$y = \alpha + \beta x$ and the sample size is n = 2, we may either
take procedure 1 to get high efficiency or use procedure 4 for easy
calculation.

When N is moderate or large and the sample size is $n \ge 2$,
procedure 3 is preferred.

When N is small and the sample size is n > 2, procedure 4 is
preferred.
ABSTRACT

Given a finite population of N units with values of a characteristic
represented by $y_i$ (i = 1, 2, ..., N), this report deals with the
problem of estimating the sum of the $y_i$'s when measures of size
$x_i$, which are positively correlated with the values $y_i$, are
known for all N units in the population. General theory and four
important methods of selecting sampling units with probabilities
proportional to sizes and without replacement are discussed.

The general theory is derived by Horvitz and Thompson (1952), and the
four methods are given by Yates and Grundy (1953), Des Raj (1956),
Hartley and Rao (1962) and Rao, Hartley and Cochran (1962),
respectively.

Results from a number of numerical examples indicated that all four
sampling methods are more efficient than sampling with replacement.
When a population approximately satisfies the linear model
$y = \alpha + \beta x$ and the sample size is n = 2, Des Raj's method
is the most efficient. However, in more general situations, the
method given by Rao, Hartley and Cochran (1962) is preferred for
easier calculation.


